I'm more interested in the technical side of this, but I'm not seeing any link to GitHub with the source code of this project.

Anyway, I have a tangential question, and this is the first time I've seen langchain, so it may be a stupid one. The point is that the vendor APIs seem far less uniform than what I'd expect from a framework like this. I'm wondering: why can't this be done with Ollama?[0] Isn't it ultimately just a system prompt, user input, and a few additional params like temperature that all these APIs require as input? I'm a bit lost in this chain of wrappers around other wrappers, especially when we're talking about services that host many models themselves (like together.xyz), and I don't even fully get the role langchain plays here. I mean, in the end, all any of these models does is repeatedly guess the next token, isn't it? So there may be a difference at the very low level, and there may be some difference at a high level (considering the different ways these models have been trained? I have no idea), but at some "mid-level" isn't all of this ultimately just the same thing? Why are these wrappers so diverse and so complicated, then? Is there a more novice-friendly tutorial explaining these concepts?

[0] https://python.langchain.com/v0.2/docs/integrations/chat/
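To the question's point, the "mid-level" interface really is close to uniform: a model name, role-tagged messages, and sampling options. A minimal sketch against Ollama's `/api/chat` REST endpoint, stdlib only (the model name, host, and prompts here are illustrative assumptions):

```python
import json
import urllib.request

def build_chat_payload(system, user, model="llama3", temperature=0.2):
    """Assemble the request body Ollama's /api/chat endpoint expects:
    a model name, a list of role-tagged messages, and sampling options."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": False,  # ask for one complete response, not a token stream
        "options": {"temperature": temperature},
    }

def chat(payload, host="http://localhost:11434"):
    """POST the payload to a locally running Ollama server, return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Swapping vendors mostly means changing the URL, the auth header, and the exact key the reply text sits under, which is roughly the surface langchain abstracts over.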
Yeah, langchain is not necessary for this. The author appears not to have shared his code yet (too bad, the visualizations are nice!), but as a poor replacement I can share mine from over a year ago:

https://github.com/m3at/hn_jobs_gpt_etl

Only using the plain OpenAI API. This was on GPT-3.5, but it should be easy to move to 4o and make use of the JSON mode. I might try a quick update this weekend.
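For reference, the plain-API-plus-JSON-mode approach can be sketched in a few lines with just the stdlib. The extraction fields below are illustrative, not the repo's actual schema:

```python
import json
import os
import urllib.request

# Illustrative extraction prompt; the field names are assumptions.
PROMPT = (
    "Extract a JSON object from the job post below with keys: "
    '"company", "location", "remote" (true/false), "technologies" (list). '
    "Return JSON only.\n\nPost:\n{post}"
)

def build_request(post, model="gpt-4o"):
    """Build the chat.completions request body, with JSON mode enabled."""
    return {
        "model": model,
        "response_format": {"type": "json_object"},  # JSON mode
        "temperature": 0,  # deterministic-ish output for extraction
        "messages": [{"role": "user", "content": PROMPT.format(post=post)}],
    }

def extract(post):
    """POST to the OpenAI REST endpoint and parse the extracted JSON."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(post)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```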
I've been working on similar functionality for jsonresume -> https://github.com/jsonresume/jsonresume.org/blob/master/app...

What the author could have done, and what I should have (but didn't), is add a set of possible values (enums) for each field. This should stop the model from coming up with variations, e.g. node vs. nodejs. In zod/tooling it would look like this:

    remote: z.enum(['none', 'hybrid', 'full']),
    framework: z.enum(['nodejs', 'rails']),

But this just shifts the problem further down: now you need a good standard set of possible values, which I have yet to find, but I'm sure it's out there.

On top of that, I am working on publishing a JobDescription.schema.json so that the next time the models train, they will internalize an already predefined schema, which should make it a lot easier to get consistent values from job descriptions.

Also, I tend to forget this a lot in these LLM days, but there are plenty of good NER (Named Entity Recognition) tools out there that you should run first before crafting robust prompts.
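The zod idea above translates directly to Python with stdlib enums plus a normalization step for near-miss variants. The alias table and value sets here are illustrative, not a proposed standard:

```python
from enum import Enum

class Remote(str, Enum):
    NONE = "none"
    HYBRID = "hybrid"
    FULL = "full"

# Map free-text variants a model might emit onto one canonical value.
ALIASES = {"node": "nodejs", "node.js": "nodejs", "ruby on rails": "rails"}
FRAMEWORKS = {"nodejs", "rails"}

def normalize_framework(raw: str) -> str:
    """Collapse variant spellings to a canonical value, or reject unknowns."""
    value = ALIASES.get(raw.strip().lower(), raw.strip().lower())
    if value not in FRAMEWORKS:
        raise ValueError(f"unknown framework: {raw!r}")
    return value
```

Rejecting unknown values (rather than passing them through) is what surfaces the "you need a good standard value set" problem early instead of in the charts.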
Yeah, I guess it is pretty mainstream :) Though another view I keep hearing is that LLMs are going to replace all jobs in 5 years, which quite frankly is disconnected from reality. Even assuming we could create an ML model that could replace a human (which, after this[0] discussion, I think the current paradigm is insufficient for), there's still the matter of building data centers, manufacturing chips, and it would need to be cheaper than paying a human.

I personally would like AI to help humans be better humans, not try to replace them. Instead of using AI to create less understanding with black-box processes, I'd rather it help us with introspection and search (I think embeddings[1] are still a pretty killer feature that I haven't heard much noise about yet).

[0] https://news.ycombinator.com/item?id=21786547

[1] https://terminusdb.com/blog/vector-database-and-vector-embed...
Check the revisions from the BLS over the past year. The BLS revised both the April and May numbers down by 30%. These are big revisions, and it's a pattern now. The media only report the initial figures.
Interesting data, but I think the percentage of remote listings is misleading. Many "remote" jobs now require you to live within commuting distance of a particular city, usually SF or NY.
Another way to improve this would be to ask posters to add GLOBAL_REMOTE, COUNTRY_REMOTE, or something similar that indicates a role is not local-remote only (i.e. restricted to one country).
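If posters adopted a tag convention like that, pulling it out of a post would be trivial; a sketch, assuming the hypothetical GLOBAL_REMOTE / COUNTRY_REMOTE tags suggested above:

```python
import re

# Hypothetical convention: posters tag remote scope explicitly in the post.
TAG = re.compile(r"\b(GLOBAL_REMOTE|COUNTRY_REMOTE)\b")

def remote_scope(post: str) -> str:
    """Return the first scope tag found, or UNSPECIFIED when absent."""
    m = TAG.search(post)
    return m.group(1) if m else "UNSPECIFIED"
```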
I wonder how this would compare against a random sample of jobs on, say, Indeed or LinkedIn. My experience of Hacker News is that it's a very biased group (in a good way) relative to the general industry.
Really cool.

I'd love to see a similar analysis of "Who Wants to be Hired". What trends exist among folks struggling to find work? That could help point people toward how to target their career growth.
It would be interesting to run this same analysis using Claude 3 Haiku, which is 1/40th the price of GPT-4o. My hunch is that the results would be very similar for a fraction of the price.
I was thinking the same; I'd love to hear the author's reasoning for going with gpt-4o. In my experience, anything above gpt-3.5-turbo is overkill for data extraction.
Beautiful analysis! Great to see hard stats on the technology breakdowns in the hiring threads, with a clever LLM approach. And the write-up was super clear.
Given the high price due to the high token count, I wonder how different the results would be if the same analysis were run with a local model.
Surprised to see Redux featured so prominently in the JS frameworks section, since it is so often criticized while many praise newer competitors like Zustand.
> Using Selenium, I used a script to google iteratively for strings query = f"ask hn who is hiring {month} {year}" to get the IDs of the items that represent the monthly threads.

FYI, you could have just used the Hacker News API and fetched all posts by the user `whoishiring`, which submits all of these monthly threads. Then filter to only the posts whose title starts with "Ask HN: Who is hiring?", as this bot also submits the "Who wants to be hired?" and "Freelancer? Seeking Freelancer?" posts.

https://hacker-news.firebaseio.com/v0/user/whoishiring.json?...
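The steps above fit in a short script against the public Firebase-hosted HN API (note it issues one request per submitted item, so it's slow but simple):

```python
import json
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"

def is_hiring_title(title: str) -> bool:
    # The whoishiring bot also posts "Who wants to be hired?" and
    # "Freelancer? Seeking freelancer?" threads; keep only the hiring ones.
    return title.startswith("Ask HN: Who is hiring?")

def get_json(path: str):
    """Fetch one JSON document from the HN API."""
    with urllib.request.urlopen(f"{API}/{path}.json") as resp:
        return json.load(resp)

def hiring_thread_ids():
    """Return the item IDs of the monthly 'Who is hiring?' threads."""
    user = get_json("user/whoishiring")
    ids = []
    for item_id in user["submitted"]:
        item = get_json(f"item/{item_id}")
        if item and is_hiring_title(item.get("title", "")):
            ids.append(item_id)
    return ids
```

From there, each thread item's `kids` field gives the top-level comment IDs, i.e. the individual job posts.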
Read the second part of what they said:

> it's hard to verify without code because there are so many false positives for "go" on any post.

Hence grep being insufficient in this case.
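The false-positive problem is easy to demonstrate: even a word-boundary, case-insensitive match for "go" catches plenty of prose that has nothing to do with the Go language (the sample posts below are made up):

```python
import re

def grep_go(text: str):
    """Naive search for 'go' as a whole word, ignoring case."""
    return re.findall(r"\bgo\b", text, flags=re.IGNORECASE)

posts = [
    "We use Go and Postgres on the backend.",     # true positive
    "Go to our careers page to apply.",           # false positive
    "Great place to go deep on infra problems.",  # false positive
]
```

Word boundaries do rule out "Django" and "MongoDB", but they can't distinguish the language from the verb, which is exactly where an LLM (or at least context-aware heuristics) earns its keep.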
NYC clearly doesn’t have the level of activity in this area that the Bay does, but there’s a scene. LeCun and the NYU crowd and the big FAIR footprint create a certain gravity. There’s stuff going on :)